Approximate Regular Expression Searching with Arbitrary Integer Weights
Abstract
We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences in worst-case time O(mn / log_k s), where s is the amount of main memory that can be allocated. The algorithm permits arbitrary integer weights and matches the complexity of the best previous techniques, but it is simpler and faster in practice. Along the way, we define a new recurrence for approximate searching in which the current values depend only on previous values. Interestingly, our algorithm also turns out to be a relevant option for simple approximate string matching with arbitrary integer weights.
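To make the problem setting concrete, here is a minimal sketch of the simpler case the abstract mentions: approximate string matching (a plain pattern, not a regular expression) with integer weights. It uses the classical column-by-column dynamic programming recurrence, not the bit-parallel algorithm of the paper. The function name and the parameters `w_ins`, `w_del` and `w_sub` are assumptions made for illustration.

```python
def weighted_approx_search(pattern, text, k, w_ins=1, w_del=1, w_sub=1):
    """Return the 0-based end positions in `text` where some substring
    matches `pattern` with total edit weight <= k.

    Classical dynamic programming baseline (hypothetical helper,
    not the paper's bit-parallel method).
    """
    m = len(pattern)
    # col[i] = cheapest way to align pattern[:i] with a suffix of the
    # text read so far; before any text, all i characters are dropped.
    col = [i * w_del for i in range(m + 1)]
    hits = []
    for j, c in enumerate(text):
        prev_diag = col[0]   # value of col[i-1] in the previous column
        col[0] = 0           # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]     # value of col[i] in the previous column
            cost = 0 if pattern[i - 1] == c else w_sub
            col[i] = min(
                prev_diag + cost,    # match or substitution
                old + w_ins,         # extra character in the text
                col[i - 1] + w_del,  # pattern character missing in text
            )
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits
```

In this classical form, `col[i]` depends on `col[i - 1]` from the same column, that is, on a current value. The abstract describes a new recurrence in which current values depend only on previous ones; the sketch above does not attempt to reproduce it.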
Similar resources
An approximate model for slug flow heat transfer in channels of arbitrary cross section
In this paper, a novel approximate solution to determine the Nusselt number for thermally developed, slug (low-Prandtl), laminar, single-phase flow in channels of arbitrary cross section is presented. Using the Saint-Venant principle in torsion of beams, it is shown that the thermally developed Nusselt number for low-Prandtl flow is only a function of the geometrical parameters of the channel c...
Approximate Regular Expression Matching
We extend the definition of Hamming and Levenshtein distance between two strings used in approximate string matching so that these two distances can also be used in approximate regular expression matching. Next, methods for constructing nondeterministic finite automata for approximate regular expression matching under both distances are presented.
Large Text Searching Allowing Errors
We present a full inverted index for exact and approximate string matching in large texts. The index is composed of a table containing the vocabulary of words of the text and a list of positions in the text corresponding to each word. The size of the table of words is usually much less than 1% of the text size and hence can be kept in main memory, where most query processing takes place. The te...
Expressing Context-Free Tree Languages by Regular Tree Grammars
In this thesis, three methods are investigated to express context-free tree languages by regular tree grammars. The first method is a characterization. We show restrictions to context-free tree grammars such that, for each restricted context-free tree grammar, a regular tree grammar can be constructed that induces the same tree language. The other two methods are approximations. An arbitrary co...
Gapped Suffix Arrays: a New Index Structure for Fast Approximate Matching
Approximate searching using an index is an important application in many fields. In this paper we introduce a new data structure called the gapped suffix array for approximate searching in the Hamming distance model. Building on the well-known filtration approach for approximate searching, the use of the gapped suffix array can improve search speed by avoiding the merging of position lists.
Journal title:
- Nord. J. Comput.
Volume 11, issue
Pages -
Publication date 2003